An Optimization Scheme in MapReduce for Reduce Stage
Abstract
As a widely used programming model for processing large data sets, MapReduce (MR) has become indispensable in data clusters and grids, e.g. a Hadoop environment. Load balancing, a key factor in the performance of map resource distribution, has recently attracted considerable attention as an optimization target. Current MR implementations distribute tasks across a cluster by hashing with random modulo operations, which can lead to uneven data distribution and skewed loads and thereby degrade the performance of the entire distributed system. In this paper, a virtual partition consistent hashing (VPCH) algorithm is proposed for the reduce stage of MR processes in order to balance job allocation. In addition, choosing the number of reducers for the reduce phase currently requires an experienced programmer, so the quality of MR scripts varies. An extreme learning method is therefore employed to recommend the number of reducers a mapped task needs. Execution time is also predicted so that users can better schedule their tasks. According to the results, VPCH achieves load balancing, and our prediction model predicts faster than SVM while maintaining similar accuracy.
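The abstract does not give the details of VPCH, so the following is only a minimal sketch of the general idea it builds on: consistent hashing with virtual nodes, used here to assign intermediate keys to reducers. It is not the authors' algorithm. The names `stable_hash`, `VirtualNodeRing`, `reducer_for`, and `load_per_reducer`, the use of MD5, and the default of 100 virtual nodes per reducer are all assumptions made for illustration.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class VirtualNodeRing:
    """Generic consistent-hash ring; each reducer owns many virtual points."""

    def __init__(self, reducers, virtual_nodes=100):
        # Hypothetical parameter: virtual_nodes is the number of ring points
        # placed for every physical reducer.
        points = []
        for reducer in reducers:
            for v in range(virtual_nodes):
                points.append((stable_hash(f"{reducer}#{v}"), reducer))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [r for _, r in points]

    def reducer_for(self, key: str) -> str:
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._hashes, stable_hash(key))
        return self._owners[idx % len(self._owners)]


def load_per_reducer(ring, keys):
    """Count how many keys each reducer would receive."""
    return Counter(ring.reducer_for(k) for k in keys)


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(10_000)]
    ring = VirtualNodeRing([f"reducer-{i}" for i in range(4)], virtual_nodes=200)
    for reducer, count in sorted(load_per_reducer(ring, keys).items()):
        print(reducer, count)
```

In a ring of this kind, adding a reducer only moves keys onto the new reducer and leaves every other assignment unchanged, whereas plain modulo hashing reshuffles most keys whenever the reducer count changes. How VPCH chooses and adjusts its virtual partitions is not stated in the abstract and is not modelled here.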
Similar resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop, and the rack-aware data placement strategy of the existing Hadoop distributed file system, assume a homogeneous cluster in which every node has the same computing capacity and is assigned the same workload. Default Hadoop d...
Clustering Social Images with MapReduce and High Performance Collective Communication
Social Image clustering is a data intensive application that provides novel challenges to high performance computing. Already this field has reached 10-100 million images represented as points in a high dimensional (up to 2048) vector space that are to be divided into up to 1-10 million clusters. In recent years MapReduce has become popular in processing big data problems due to its attractive ...
SEISMIC DESIGN OPTIMIZATION OF STEEL STRUCTURES BY A SEQUENTIAL ECBO ALGORITHM
The objective of the present paper is to propose a sequential enhanced colliding bodies optimization (SECBO) algorithm for implementation of seismic optimization of steel braced frames in the framework of performance-based design (PBD). In order to achieve this purpose, the ECBO is sequentially employed in a multi-stage scheme where in each stage an initial population is generated based on the ...
Coefficient of Performance Optimization of a Single Stage Thermoelectric Cooler
In thermoelectric coolers (TECs), an applied external voltage is converted into a temperature difference through the Peltier effect. The basic structure of a TEC is a single-stage device. Because of the low efficiency of thermoelectric coolers, especially their low coefficient of performance (COP), optimal design of the geometrical parameters of such devices is vital. For this purpose, ...
Adaptive Preshuffling in Hadoop Clusters
MapReduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network loads imposed by shuffle-intensive applications. Designing...
Publication date: 2016